MapReduce/Bigtable for Distributed Optimization

Authors

  • Keith B. Hall
  • Scott Gilpin
  • Gideon Mann
Abstract

With large data sets, gradient-based optimization can be time-consuming, for example when optimizing the log-likelihood of maximum entropy models. Distributed methods are therefore appealing, and a number of distributed gradient optimization strategies have been proposed, including distributed gradient, asynchronous updates, and iterative parameter mixtures. In this paper, we evaluate these strategies with regard to their accuracy and speed over MapReduce/Bigtable, and discuss the techniques and configurations needed for high performance.
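
To make one of the strategies named in the abstract concrete, here is a minimal single-process sketch of an iterative parameter mixture, assuming a binary logistic-regression model as a stand-in for a maximum entropy model. It is not the authors' implementation and uses no MapReduce or Bigtable API: the function names (`local_sgd_epoch`, `mix`, `iterative_parameter_mixture`), the toy data, the learning rate, and the uniform averaging are all illustrative assumptions. Each shard plays the role of one worker; the list comprehension stands in for the map phase and `mix` for the reduce phase.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_sgd_epoch(weights, shard, lr):
    """One SGD pass over a single shard, starting from the shared weights."""
    w = list(weights)
    for x, y in shard:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        # Per-example gradient of the negative log-likelihood is (p - y) * x.
        for j, xj in enumerate(x):
            w[j] -= lr * (p - y) * xj
    return w


def mix(models):
    """Uniform average of the per-shard weight vectors."""
    n = len(models)
    return [sum(m[j] for m in models) / n for j in range(len(models[0]))]


def iterative_parameter_mixture(shards, dim, epochs=20, lr=0.5):
    """Alternate local training on every shard with averaging of the results."""
    w = [0.0] * dim
    for _ in range(epochs):
        local_models = [local_sgd_epoch(w, shard, lr) for shard in shards]
        w = mix(local_models)
    return w


def make_toy_data(n=200, seed=0):
    """Hypothetical toy data: label is 1 when x1 + x2 > 0, else 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(([1.0, x1, x2], 1 if x1 + x2 > 0 else 0))  # 1.0 is a bias feature
    return data


def accuracy(w, data):
    correct = 0
    for x, y in data:
        predicted = 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5 else 0
        correct += predicted == y
    return correct / len(data)


data = make_toy_data()
shards = [data[i::4] for i in range(4)]  # four simulated workers
w = iterative_parameter_mixture(shards, dim=3)
acc = accuracy(w, data)
```

In this sketch the workers communicate only once per epoch, when their weight vectors are averaged. A distributed-gradient variant would instead sum per-shard gradients and apply a single shared update per pass, which is the kind of accuracy and speed trade-off the abstract says the paper evaluates.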

Similar resources

Motivating a Distributed System of Commodity Machines

This report examines the price/performance benefit of using a large cluster of commodity machines rather than server level hardware for certain large scale software applications. A number of tools are presented which make it easier to produce software that runs across large clusters of commodity machines. These tools are the Chubby locking service, the Google file system, MapReduce and BigTable...

Hbase - non SQL Database, Performances Evaluation

HBase is the open source version of BigTable distributed storage system developed by Google for the management of large volume of structured data. HBase emulates most of the functionalities provided by BigTable. Like most non SQL database systems, HBase is written in Java. The current work’s purpose is to evaluate the performances of the HBase implementation in comparison with SQL database, and...

Fast Multi-fields Query Processing in Bigtable Based Cloud Systems

With the rapid increase of data sizes, enterprise applications are migrating their backend data management and analytic systems into cloud based data management systems. Bigtable is among one of the major data models used by cloud storage systems as their storage layer. Such systems provide high scalability and schema flexibility, and support efficient point and range based queries based on rowk...

DISTRIBUTED SYSTEMS B534 SURVEY PAPER The Chubby Lock Service

This is a survey paper written for the class B534, Distributed Systems. The purpose of this paper is to encourage us to learn new things about how distributed systems come to work in reality and how they are actually evaluated and applied. The topic which is to be presented in this paper is the Chubby Lock service that is implemented by Google and is part of Google Labs. An important point to t...

How Big Hadoop Clusters Break in the Real World

Hadoop is among today’s most widely deployed “big data” systems. Cloudera is a company offering paid Hadoop services and support. This poster abstract describes lessons from examining a sample of 293 support tickets, from February through July of 2011. We manually labelled the tickets in our sample with the established root cause and the specific system component being worked on. Tickets cover ...

Publication date: 2010